{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copyright 2019 Google LLC\n",
    "#\n",
    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "#     https://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Retail Product Stockouts Prediction using AutoML Tables\n",
    "\n",
    "<table align=\"left\">\n",
    "  <td>\n",
    "    <a href=\"https://cloud.google.com/ml-engine/docs/tensorflow/getting-started-keras\">\n",
    "      <img src=\"https://cloud.google.com/_static/images/cloud/icons/favicons/onecloud/super_cloud.png\"\n",
    "           alt=\"Google Cloud logo\" width=\"32px\"> Read on cloud.google.com\n",
    "    </a>\n",
    "  </td>\n",
    "  <td>\n",
    "    <a href=\"\">\n",
    "      <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Colab logo\"> Run in Colab\n",
    "    </a>\n",
    "  </td>\n",
    "  <td>\n",
    "    <a href=\"\">\n",
    "      <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\">\n",
    "      View on GitHub\n",
    "    </a>\n",
    "  </td>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "m26YhtBMvVWA"
   },
   "source": [
    "# Overview\n",
    "\n",
    "AutoML Tables enables you to build machine learning models based on tables of your own data and host them on Google Cloud for scalability. This Notebook demonstrates how you can use AutoML Tables to solve a product stockouts problem in the retail industry. This problem is solved using a binary classification approach, which predicts whether a particular product at a certain store will be out-of-stock or not in the next four weeks. Once the solution is built, you can plug this in with your production system and proactively predict stock-outs for your business."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "m26YhtBMvVWA"
   },
   "source": [
    "## Objective\n",
    "\n",
    "### Problem statement\n",
    "\n",
    "A stockout, or out-of-stock (OOS) event is an event that causes inventory to be exhausted. While out-of-stocks can occur along the entire supply chain, the most visible kind are retail out-of-stocks in the fast-moving consumer goods industry (e.g., sweets, diapers, fruits). Stockouts are the opposite of overstocks, where too much inventory is retained.\n",
    "\n",
    "### Impact\n",
    "\n",
    "According to a study by researchers Thomas Gruen and Daniel Corsten, the global average level of out-of-stocks within retail fast-moving consumer goods sector across developed economies was 8.3% in 2002. This means that shoppers would have a 42% chance of fulfilling a ten-item shopping list without encountering a stockout. Despite the initiatives designed to improve the collaboration of retailers and their suppliers, such as Efficient Consumer Response (ECR), and despite the increasing use of new technologies such as radio-frequency identification (RFID) and point-of-sale data analytics, this situation has improved little over the past decades.\n",
    "\n",
    "The biggest impacts being\n",
    "1. Customer dissatisfaction\n",
    "2. Loss of revenue\n",
    "\n",
    "### Machine Learning Solution\n",
    "\n",
    "Using machine learning to solve for stock-outs can help with store operations and thus prevent out-of-stock proactively.\n",
    "\n",
    "There are three big challenges any retailer would face as they try and solve this problem with machine learning:\n",
    "\n",
    "1. Data silos: Sales data, supply-chain data, inventory data, etc. may all be in silos. Such disjoint datasets could be a challenge to work with as a machine learning model tries to derive insights from all these data points. \n",
    "2. Missing Features: Features such as vendor location, weather conditions, etc. could add a lot of value to a machine learning algorithm to learn from. But such features are not always available and when building machine learning solutions we think for collecting features as an iterative approach to improving the machine learning model.\n",
    "3. Imbalanced dataset: Datasets for classification problems such as retail stock-out are traditionally very imbalanced with fewer cases for stock-out. Designing machine learning solutions by hand for such problems would be time consuming effort when your team should be focusing on collecting features.\n",
    "\n",
    "Hence, we recommend using AutoML Tables. With AutoML Tables you only need to work on acquiring all data and features, and AutoML Tables would do the rest. This is a one-click deploy to solving the problem of stock-out with machine learning."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "m26YhtBMvVWA"
   },
   "source": [
    "## Dataset\n",
    "\n",
    "In this solution, you will use two datasets: Training/Evaluation data and Batch Prediction inputs. To access the datasets in BigQuery, you need the following information. \n",
    "\n",
    "Training/Evaluation dataset: \n",
    "\n",
    "`Project ID: product-stockout`\n",
    "\n",
    "`Dataset ID: product_stockout`\n",
    "\n",
    "`Table ID: stockout`\n",
    "\n",
    "Batch Prediction inputs: \n",
    "\n",
    "`Project ID: product-stockout`\n",
    "\n",
    "`Dataset ID: product_stockout`\n",
    "\n",
    "`Table ID: batch_prediction_inputs`\n",
    "\n",
    "### Data Schema\n",
    "\n",
    "|Field name   \t|Datatype   \t|Type   \t|Description   \t|\n",
    "|---\t|---\t|---\t|---\t|\n",
    "|Item_Number |STRING |Identifier |This is the product/ item identifier |\n",
    "|Category |STRING |Identifier\t|Several items could belong to one category |\n",
    "|Vendor_Number\t|STRING\t|Identifier\t|Product vendor identifier |\n",
    "|Store_Number\t|STRING\t|Identifier\t|Store identifier |\n",
    "|Item_Description\t|STRING\t|Text Features\t|Item Description |\n",
    "|Category_Name\t|STRING\t|Text Features\t|Category Name |\n",
    "|Vendor_Name\t|STRING\t|Text Features\t|Vendor Name |\n",
    "|Store_Name\t|STRING\t|Text Features\t|Store Name |\n",
    "|Address\t|STRING\t|Text Features\t|Address |\n",
    "|City\t|STRING\t|Categorical Features\t|City |\n",
    "|Zip_Code\t|STRING\t|Categorical Features\t|Zip-code |\n",
    "|Store_Location\t|STRING\t|Categorical Features\t|Store Location |\n",
    "|County_Number\t|STRING\t|Categorical Features\t|County Number |\n",
    "|County\t|STRING\t|Categorical Features\t|County Name |\n",
    "|Weekly Sales Quantity |INTEGER\t|Time series data\t|52 columns for weekly sales quantity from week 1 to week 52 |\n",
    "|Weekly Sales Dollars\t|INTEGER\t|Time series data\t|52 columns for weekly sales dollars from week 1 to week 52 |\n",
    "|Inventory\t|FLOAT\t|Numeric Feature\t|This inventory is stocked by the retailer looking at past sales and seasonality of the product to meet demand for future sales. |\n",
    "|Stockout\t|INTEGER\t|Label\t|(1 - Stock-out, 0 - No stock-out) When the demand for four weeks future sales is not met by the inventory in stock we say we see a stock-out. This is because an early warning sign would help the retailer re-stock inventory with a lead time for the stock to be replenished. |\n",
    "\n",
    "\n",
    "To use AutoML Tables with BigQuery you do not need to download this dataset. However, if you would like to use AutoML Tables with GCS you may want to download this dataset and upload it into your GCP Project storage bucket. \n",
    "\n",
    "Instructions to download dataset: \n",
    "\n",
    "Sample Dataset: Download this dataset which contains sales data.\n",
    "\n",
    "1. [Link to training data](https://console.cloud.google.com/bigquery?folder=&organizationId=&project=product-stockout&p=product-stockout&d=product_stockout&t=stockout&page=table):  \n",
    "\n",
    "Dataset URI: <bq://product-stockout.product_stockout.stockout>\n",
    "\n",
    "2. [Link to data for batch predictions](https://console.cloud.google.com/bigquery?folder=&organizationId=&project=product-stockout&p=product-stockout&d=product_stockout&t=batch_prediction_inputs&page=table):  \n",
    "\n",
    "Dataset URI: <bq://product-stockout.product_stockout.batch_prediction_inputs>\n",
    "\n",
    "Upload this dataset to GCS or BigQuery (optional). \n",
    "\n",
    "You could select either [GCS](https://cloud.google.com/storage/) or [BigQuery](https://cloud.google.com/bigquery/) as the location of your choice to store the data for this challenge. \n",
    "\n",
    "1. Storing data on GCS: [Creating storage buckets, Uploading data to storage buckets](https://cloud.google.com/storage/docs/creating-buckets)\n",
    "2. Storing data on BigQuery: [Create and load data to BigQuery](https://cloud.google.com/bigquery/docs/quickstarts/quickstart-web-ui) (optional)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "b--5FDDwCG9C"
   },
   "source": [
    "## 1. Before you begin\n",
    "\n",
    "Follow the [AutoML Tables documentation](https://cloud.google.com/automl-tables/docs/) to:\n",
    "* Create a Google Cloud Platform (GCP) project and local development environment.\n",
    "* Enable billing.\n",
    "* Enable AutoML API.\n",
    "* Enter your project ID in the cell below. Then run the  cell to make sure the\n",
    "\n",
    "**If you are using Colab or AI Platform Notebooks**, your environment already meets\n",
    "all the requirements to run this notebook. You can skip this step from the AutoML Tables documentation\n",
    "\n",
    "Cloud SDK uses the right project for all the commands in this notebook.\n",
    "\n",
    "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "PROJECT_ID = \"<your-project-id>\" # @param {type:\"string\"}\n",
    "COMPUTE_REGION = \"us-central1\" # Currently only supported region.\n",
    "! gcloud config set project $PROJECT_ID"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "xZECt1oL429r"
   },
   "source": [
    "\n",
    "\n",
    "---\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This section runs initialization and authentication. It creates an authenticated session which is required for running any of the following sections."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "rstRPH9SyZj_"
   },
   "source": [
    "###  Authenticate your GCP account\n",
    "\n",
    "**If you are using AI Platform Notebooks**, your environment is already\n",
    "authenticated. Skip this step."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**If you are using Colab**, run the cell below and follow the instructions\n",
    "when prompted to authenticate your account via oAuth.\n",
    "\n",
    "**Otherwise**, follow these steps:\n",
    "\n",
    "1. In the GCP Console, go to the [**Create service account key**\n",
    "   page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n",
    "\n",
    "2. From the **Service account** drop-down list, select **New service account**.\n",
    "\n",
    "3. In the **Service account name** field, enter a name.\n",
    "\n",
    "4. From the **Role** drop-down list, select\n",
    "   **AutoML > AutoML Admin** and\n",
    "   **Storage > Storage Object Admin**.\n",
    "\n",
    "5. Click *Create*. A JSON file that contains your key downloads to your\n",
    "local environment.\n",
    "\n",
    "6. Enter the path to your service account key as the\n",
    "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "\n",
    "# If you are running this notebook in Colab, run this cell and follow the\n",
    "# instructions to authenticate your GCP account. This provides access to your\n",
    "# Cloud Storage bucket and lets you submit training jobs and prediction\n",
    "# requests.\n",
    "\n",
    "if 'google.colab' in sys.modules:    \n",
    "  from google.colab import files\n",
    "  keyfile_upload = files.upload()\n",
    "  keyfile = list(keyfile_upload.keys())[0]\n",
    "  %env GOOGLE_APPLICATION_CREDENTIALS $keyfile\n",
    "# If you are running this notebook locally, replace the string below with the\n",
    "# path to your service account key and run this cell to authenticate your GCP\n",
    "# account.\n",
    "else:\n",
    "  %env GOOGLE_APPLICATION_CREDENTIALS /path/to/service_account.json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "BR0POq2UzE7e"
   },
   "source": [
    "### Install the client library\n",
    "Run the following cell to install the client libary using `pip`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install --quiet google-cloud-automl"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Restart the kernel to allow automl_v1beta1 to be imported for Jupyter Notebooks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import HTML\n",
    "HTML(\"<script>Jupyter.notebook.kernel.restart()</script>\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import libraries and define constants\n",
    "\n",
    "First, import Python libraries required for training,\n",
    "The code example below demonstrates importing the AutoML Python API module into a python script. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# AutoML library\n",
    "from google.cloud import automl\n",
    "\n",
    "import google.cloud.automl_v1beta1.proto.data_types_pb2 as data_types\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "client = automl.TablesClient(project=PROJECT_ID, region=COMPUTE_REGION)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "s3F2xbEJdDvN"
   },
   "source": [
    "### Test the set up"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "rUlBcZ3OfWcJ"
   },
   "source": [
    "To test whether your project set up and authentication steps were successful, run the following cell to list your datasets in this project.\n",
    "\n",
    "If no dataset has previously imported into AutoML Tables, you shall expect an empty return."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "both",
    "colab": {},
    "colab_type": "code",
    "id": "sf32nKXIqYje"
   },
   "outputs": [],
   "source": [
    "#@title List datasets. { vertical-output: true }\n",
    "\n",
    "list_datasets = client.list_datasets()\n",
    "datasets = { dataset.display_name: dataset.name for dataset in list_datasets }\n",
    "datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "t9uE8MvMkOPd"
   },
   "source": [
    "You can also print the list of your models by running the following cell.\n",
    "\n",
    "If no model has previously trained using AutoML Tables, you shall expect an empty return."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "both",
    "colab": {},
    "colab_type": "code",
    "id": "j4-bYRSWj7xk"
   },
   "outputs": [],
   "source": [
    "#@title List models. { vertical-output: true }\n",
    "\n",
    "list_models = client.list_models()\n",
    "models = { model.display_name: model.name for model in list_models }\n",
    "models"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "qozQWMnOu48y"
   },
   "source": [
    "\n",
    "\n",
    "---\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ODt86YuVDZzm"
   },
   "source": [
    "## 2. Import training data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "XwjZc9Q62Fm5"
   },
   "source": [
    "### Create dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "_JfZFGSceyE_"
   },
   "source": [
    "Select a dataset display name and pass your table source information to create a new dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "Z_JErW3cw-0J"
   },
   "outputs": [],
   "source": [
    "#@title Create dataset { vertical-output: true, output-height: 200 }\n",
    "\n",
    "dataset_display_name = 'stockout_data' #@param {type: 'string'}\n",
    "\n",
    "dataset = client.create_dataset(dataset_display_name)\n",
    "dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "35YZ9dy34VqJ"
   },
   "source": [
    "### Import data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "3c0o15gVREAw"
   },
   "source": [
    "You can import your data to AutoML Tables from GCS or BigQuery. For this solution, you will import data from a BigQuery Table. The URI for your table is in the format of `bq://PROJECT_ID.DATASET_ID.TABLE_ID`.\n",
    "\n",
    "The BigQuery Table used for demonstration purpose can be accessed as `bq://product-stockout.product_stockout.stockout`. \n",
    "\n",
    "See the table schema and dataset description from the README. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "FNVYfpoXJsNB"
   },
   "outputs": [],
   "source": [
    "#@title Import data { vertical-output: true }\n",
    "\n",
    "import_data_operation = client.import_data(\n",
    "    dataset=dataset,\n",
    "    bigquery_input_uri=bq_input_uri,\n",
    ")\n",
    "print('Dataset import operation: {}'.format(import_data_operation))\n",
    "\n",
    "# Synchronous check of operation status. Wait until import is done.\n",
    "import_data_operation.result()\n",
    "dataset = client.get_dataset(dataset_name=dataset.name)\n",
    "dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "_WLvyGIDe9ah"
   },
   "source": [
    "Importing this stockout datasets takes about 10 minutes. \n",
    "\n",
    "If you re-visit this Notebook, uncomment the following cell and run the command to retrieve your dataset. Replace `YOUR_DATASET_NAME` with its actual value obtained in the preceding cells.\n",
    "\n",
    "`YOUR_DATASET_NAME` is a string in the format of `'projects/<project_id>/locations/<location>/datasets/<dataset_id>'`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "P6NkRMyJfAGm"
   },
   "outputs": [],
   "source": [
    "# dataset_name = '<YOUR_DATASET_NAME>' #@param {type: 'string'}\n",
    "# dataset = client.get_dataset(dataset_name=dataset_name) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "QdxBI4s44ZRI"
   },
   "source": [
    "### Review the specs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "RC0PWKqH4jwr"
   },
   "source": [
    "Run the following command to see table specs such as row count."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "v2Vzq_gwXxo-"
   },
   "outputs": [],
   "source": [
    "# List table specs\n",
    "list_table_specs_response = client.list_table_specs(dataset=dataset)\n",
    "table_specs = [s for s in list_table_specs_response]\n",
    "\n",
    "# List column specs\n",
    "list_column_specs_response = client.list_column_specs(dataset=dataset)\n",
    "column_specs = {s.display_name: s for s in list_column_specs_response}\n",
    "\n",
    "# Print Features and data_type:\n",
    "\n",
    "features = [(key, data_types.TypeCode.Name(value.data_type.type_code)) for key, value in column_specs.items()]\n",
    "print('Feature list:\\n')\n",
    "for feature in features:\n",
    "    print(feature[0],':', feature[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "type_counts = {}\n",
    "for column_spec in column_specs.values():\n",
    "  type_name = data_types.TypeCode.Name(column_spec.data_type.type_code)\n",
    "  type_counts[type_name] = type_counts.get(type_name, 0) + 1\n",
    "    \n",
    "plt.pie(x=type_counts.values(), labels=type_counts.keys(), autopct='%1.1f%%')\n",
    "plt.axis('equal')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Lqjq4X43v3ON"
   },
   "source": [
    "In the pie chart above, you see this dataset contains three variable types: `FLOAT64` (treated as `Numeric`), `CATEGORY` (treated as `Categorical`) and `STRING` (treated as `Text`). "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "FNykW_YOYt6d"
   },
   "source": [
    "___"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "kNRVJqVOL8h3"
   },
   "source": [
    "## 3. Update dataset: assign a label column and enable nullable columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "VsOPwxN9fOIl"
   },
   "source": [
    "### Get column specs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "-57gehId9PQ5"
   },
   "source": [
    "AutoML Tables automatically detects your data column type. \n",
    "\n",
    "There are a total of 120 columns in this stockout dataset.\n",
    "\n",
    "Run the following command to check the column data type that automaticallyed detected. If columns contains only numerical values, but they represent categories, change that column data type to caregorical by updating your schema.\n",
    "\n",
    "In addition, AutoML Tables detects `Stockout` to be categorical that chooses to run a classification model.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "jso_JBI9fgy6"
   },
   "outputs": [],
   "source": [
    "#@title Check column data type { vertical-output: true }\n",
    "\n",
    "# Print column data types.\n",
    "for column in column_specs:\n",
    "    print(column, '-', column_specs[column].data_type)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "iRqdQ7Xiq04x"
   },
   "source": [
    "### Update columns: make categorical\n",
    "\n",
    "From the column data type, you noticed `Item_Number`, `Category`, `Vendor_Number`, `Store_Number`, `Zip_Code` and `County_Number` have been autodetected as `FLOAT64` (Numerical) instead of `CATEGORY` (Categorical). \n",
    "\n",
    "In this solution, the columns `Item_Number`, `Category`, `Vendor_Number` and `Store_Number` are not nullable, but `Zip_Code` and `County_Number` can take null values.\n",
    "\n",
    "To change the data type, you can update the schema by updating the column spec."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "_xePITEYf5po"
   },
   "outputs": [],
   "source": [
    "# Update dataset\n",
    "categorical_column_names = ['Item_Number',\n",
    "                        'Category',\n",
    "                        'Vendor_Number',\n",
    "                        'Store_Number',\n",
    "                        'Zip_Code',\n",
    "                        'County_Number']\n",
    "is_nullable = [False, \n",
    "              False,\n",
    "              False,\n",
    "              False,\n",
    "              True,\n",
    "              True]\n",
    "\n",
    "for i in range(len(categorical_column_names)):\n",
    "    column_name = categorical_column_names[i]\n",
    "    nullable = is_nullable[i]\n",
    "    client.update_column_spec(\n",
    "        dataset=dataset,\n",
    "        column_spec_display_name=column_name,\n",
    "        type_code='CATEGORY',\n",
    "        nullable=nullable,\n",
    "    )\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "nDMH_chybe4w"
   },
   "source": [
    "### Update dataset: assign a label\n",
    "\n",
    "Select the target column and update the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "hVIruWg0u33t"
   },
   "outputs": [],
   "source": [
    "#@title Update dataset { vertical-output: true }\n",
    "\n",
    "target_column_name = 'Stockout' #@param {type: 'string'}\n",
    "update_dataset_response = client.set_target_column(\n",
    "    dataset=dataset,\n",
    "    column_spec_display_name=target_column_name,\n",
    ")\n",
    "update_dataset_response"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "z23NITLrcxmi"
   },
   "source": [
    "___"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "FcKgvj1-Tbgj"
   },
   "source": [
    "## 4. Creating a model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Pnlk8vdQlO_k"
   },
   "source": [
    "### Train a model\n",
    "Training the model may take one hour or more. To obtain the results with less training time or budget, you can set [`train_budget_milli_node_hours`](https://cloud.google.com/automl-tables/docs/reference/rest/v1beta1/projects.locations.models), which is the train budget of creating this model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. \n",
    "\n",
    "For demonstration purpose, the following command sets the budget as 1 node hour. You can increate that number up to a maximum of 72 hours ('train_budget_milli_node_hours': 72000) for the best model performance. \n",
    "\n",
    "You can also select the objective to optimize your model training by setting `optimization_objective`. This solution optimizes the model by maximizing the Area Under the Precision-Recall (PR) Curve.  \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "11izNd6Fu37N"
   },
   "outputs": [],
   "source": [
    "#@title Create model { vertical-output: true }\n",
    "\n",
    "model_display_name = 'stockout_model' #@param {type:'string'}\n",
    "\n",
    "create_model_response = client.create_model(\n",
    "    model_display_name,\n",
    "    dataset=dataset,\n",
    "    train_budget_milli_node_hours=1000,\n",
    "    optimization_objective='MAXIMIZE_AU_PRC',\n",
    ")\n",
    "print('Create model operation: {}'.format(create_model_response.operation))\n",
    "# Wait until model training is done.\n",
    "model = create_model_response.result()\n",
    "model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "neYjToB36q9E"
   },
   "source": [
    "If your Colab times out, use `client.list_models()` to check whether your model has been created. \n",
    "\n",
    "Then uncomment the following cell and run the command to retrieve your model. Replace `YOUR_MODEL_NAME` with its actual value obtained in the preceding cell.\n",
    "\n",
    "`YOUR_MODEL_NAME` is a string in the format of `'projects/<project_id>/locations/<location>/models/<model_id>'`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "QptCwUIK7yhU"
   },
   "outputs": [],
   "source": [
    "# model_name = '<YOUR_MODEL_NAME>' #@param {type: 'string'}\n",
    "# model = client.get_model(model_name=model_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "1wS1is9IY5nK"
   },
   "source": [
    "___"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "TarOq84-GXch"
   },
   "source": [
    "## 5. Batch prediction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Soy5OB8Wbp_R"
   },
   "source": [
    "### Initialize prediction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "39bIGjIlau5a"
   },
   "source": [
    "Your data source for batch prediction can be GCS or BigQuery. For this solution, you will use a BigQuery Table as the input source. The URI for your table is in the format of `bq://PROJECT_ID.DATASET_ID.TABLE_ID`.\n",
    "\n",
    "To write out the predictions, you need to specify a GCS bucket `gs://BUCKET_NAME`.\n",
    "\n",
    "The AutoML Tables logs the errors in the `errors.csv` file.\n",
    "\n",
    "**NOTE:** The batch prediction output file(s) will be updated to the GCS bucket that you set in the preceding cells."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "gkF3bH0qu4DU"
   },
   "outputs": [],
   "source": [
    "#@title Start batch prediction { vertical-output: true, output-height: 200 }\n",
    "\n",
    "batch_predict_bq_input_uri = 'bq://product-stockout.product_stockout.batch_prediction_inputs'\n",
    "batch_predict_gcs_output_uri_prefix = 'gs://<BUCKET_NAME>'  #@param {type:'string'}\n",
    "\n",
    "batch_predict_response = client.batch_predict(\n",
    "    model=model, \n",
    "    biqquery_input_uris=batch_predict_bq_input_uri,\n",
    "    gcs_output_uri_prefix=batch_predict_gcs_output_uri_prefix,\n",
    ")\n",
    "print('Batch prediction operation: {}'.format(batch_predict_response.operation))\n",
    "# Wait until batch prediction is done.\n",
    "batch_predict_result = batch_predict_response.result()\n",
    "batch_predict_response.metadata"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "kgwbJwS2iLpc"
   },
   "outputs": [],
   "source": [
    "#@title Check prediction results { vertical-output: true }\n",
    "\n",
    "gcs_output_directory = batch_predict_response.metadata.batch_predict_details.output_info.gcs_output_directory\n",
    "result_file = gcs_output_directory + 'tables_1.csv'\n",
    "print('Batch prediction results are stored as: {}'.format(result_file))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Clean up\n",
    "\n",
    "To clean up all GCP resources used in this notebook, you can [delete the GCP\n",
    "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects)."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "retail_product_stockout_prediction.ipynb",
   "provenance": [],
   "version": "0.3.2"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
